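Multimodal intent recognition is an important task for understanding human language in real-world scenes. Most existing intent recognition methods are limited in their use of multimodal information because the available benchmark datasets contain only text. This paper introduces a novel dataset for multimodal intent recognition (MIntRec) to address this issue. It formulates coarse-grained and fine-grained intent taxonomies based on data collected from the TV series Superstore. The dataset consists of 2,224 high-quality samples with text, video, and audio modalities, and has multimodal annotations across twenty intent categories. Furthermore, we provide annotated bounding boxes of speakers in each video segment and implement an automatic process for speaker annotation. MIntRec helps researchers mine the relationships between different modalities to enhance intent recognition. We build baselines by adapting three powerful multimodal fusion methods, extracting features from each modality and modeling cross-modal interactions. Extensive experiments show that employing the non-verbal modalities yields substantial improvements over the text-only modality, demonstrating the effectiveness of multimodal information for intent recognition. The gap between the best-performing methods and humans indicates the challenge and importance of this task for the community. The full dataset and code are available at https://github.com/thuiar/mintrec.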
translated by Google Translate
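Background: Assessment of left ventricular (LV) function with myocardial perfusion SPECT (MPS) relies on accurate myocardial segmentation. The purpose of this paper is to develop and validate a new method that combines deep learning with shape priors to accurately extract the LV myocardium for automatic measurement of LV functional parameters. Methods: A segmentation architecture that integrates a three-dimensional (3D) V-Net with a shape deformation module was developed. Shape priors generated by a dynamic programming (DP) algorithm were then used to constrain and guide the model output during training, for faster convergence and improved performance. Stratified 5-fold cross-validation was used to train and validate our models. Results: The results of our proposed method agree with the ground truth. Our proposed model achieved Dice similarity coefficients (DSC) of 0.9573 (0.0244), 0.9821 (0.0137), and 0.9903 (0.0041), and Hausdorff distances (HD) of 6.7529 (2.7334) mm, 7.2507 (3.1952) mm, and 7.6122 (3.0134) mm, in extracting the endocardium, myocardium, and epicardium, respectively. Conclusion: Our proposed method achieves high accuracy in extracting LV myocardial contours and assessing LV function.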
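For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how a Dice similarity coefficient and a symmetric Hausdorff distance can be computed. It is not the authors' evaluation code: the function names, the representation of a mask as a set of voxel index tuples, and the toy data are assumptions made for illustration. Distances here are in voxel units; converting to mm would require the voxel spacing, which the abstract does not give.

```python
import math


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two masks given as voxel index sets."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(points_a, points_b), directed(points_b, points_a))


# Hypothetical toy masks (voxel indices), not data from the paper.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
truth = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}

dsc = dice_coefficient(pred, truth)
hd = hausdorff_distance(pred, truth)
```

In this toy case the masks share two of three voxels each, so the DSC is 2·2/(3+3) = 2/3, and each mask has one voxel at distance 1 from the other mask, so the Hausdorff distance is 1.0.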
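Open intent detection is a significant problem in natural language understanding. It aims to detect unseen open intents using prior knowledge of only the known intents. Current methods face two core challenges in this task. On the one hand, they are limited in learning representations that are well suited to detecting open intents. On the other hand, there is no effective approach for obtaining specific and compact decision boundaries for known intents. To address these issues, this paper introduces an original framework, DA-ADB, which successively learns distance-aware intent representations and adaptive decision boundaries for open intent detection. Specifically, we first leverage distance information to enhance the discriminative capability of the intent representations. Then, we design a novel loss function to obtain appropriate decision boundaries by balancing empirical and open space risks. Extensive experiments show the effectiveness of the distance-aware and boundary learning strategies. Compared with state-of-the-art methods, our method achieves substantial improvements on three benchmark datasets. It also yields robust performance with different proportions of labeled data and known categories. The full data and code are available at https://github.com/thuiar/textoir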
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
The mainstream workflow of image recognition applications is to first train one global model on the cloud for a wide range of classes and then serve numerous clients, each with heterogeneous images from a small subset of classes to be recognized. Given the cloud-client discrepancy in the range of image classes, the recognition model should be strongly adaptive, intuitively by concentrating its focus on each individual client's local dynamic class subset, while incurring negligible overhead. In this work, we propose to plug a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models, requiring only one-time cloud-based training to be client-adaptive. In particular, given a target image from a certain client, ICIIA introduces multi-head self-attention to retrieve relevant images from the client's historical unlabeled images, thereby calibrating the focus and the recognition result. Further considering that ICIIA's overhead is dominated by linear projection, we propose partitioned linear projection with feature shuffling as a replacement and allow increasing the number of partitions to dramatically improve efficiency without sacrificing too much accuracy. We finally evaluate ICIIA using 3 different recognition tasks with 9 backbone models over 5 representative datasets. Extensive evaluation results demonstrate the effectiveness and efficiency of ICIIA. Specifically, for ImageNet-1K with the backbone models of MobileNetV3-L and Swin-B, ICIIA can improve the testing accuracy to 83.37% (+8.11%) and 88.86% (+5.28%), while adding only 1.62% and 0.02% of FLOPs, respectively.
Compared with network datasets, multi-dimensional data are much more common nowadays. If we can model multi-dimensional datasets as networks with accurate network properties while preserving the original dataset features, we can not only explore the dataset dynamics but also acquire abundant synthetic network data. This paper proposes a fast scale-free network model for large-scale multi-dimensional data not limited to the network domain. The proposed network model is dynamic and able to generate scale-free graphs in linear time regardless of the scale or field of the modeled dataset. We further argue that in a dynamic network where edge-generation probability represents influence, that influence decays as the network evolves. We demonstrate how this influence decay phenomenon is reflected in our model and provide a case study using the Global Terrorism Database.
In recent years, aerial swarm technology has developed rapidly. A key technology for achieving a fully autonomous aerial swarm is decentralized and distributed collaborative SLAM (CSLAM), which estimates the relative poses and the consistent global trajectories. In this paper, we propose $D^2$SLAM: a decentralized and distributed ($D^2$) collaborative SLAM algorithm. This algorithm has high local accuracy and global consistency, and the distributed architecture allows it to scale up. $D^2$SLAM covers swarm state estimation in two scenarios: near-field state estimation for high real-time accuracy at close range, and far-field state estimation for globally consistent trajectory estimation at long range between UAVs. Distributed optimization algorithms are adopted as the backend to achieve the $D^2$ goal. $D^2$SLAM is robust to transient loss of communication, network delays, and other factors. Thanks to its flexible architecture, $D^2$SLAM has the potential to be applied in various scenarios.
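Although the use of multiple unmanned aerial vehicles (UAVs) has great potential for fast autonomous exploration, it has received far too little attention. In this paper, we present RACER, a rapid collaborative exploration approach using a fleet of decentralized UAVs. To dispatch the UAVs effectively, a pairwise interaction based on an online hgrid space decomposition is used. It ensures that all UAVs explore distinct regions simultaneously, using only asynchronous and limited communication. Further, we optimize the coverage paths of unknown space and balance the workloads partitioned to each UAV with a Capacitated Vehicle Routing Problem (CVRP) formulation. Given the task allocation, each UAV constantly updates its coverage path and incrementally extracts crucial information to support exploration planning. A hierarchical planner finds exploration paths, refines local viewpoints, and generates minimum-time trajectories in sequence to explore the unknown space agilely and safely. The proposed approach is evaluated extensively, showing high exploration efficiency, scalability, and robustness to limited communication. Furthermore, for the first time, we achieve fully decentralized collaborative exploration with multiple UAVs in the real world. We will release our implementation as an open-source package.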
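As a core part of autonomous driving systems, motion planning has received extensive attention from academia and industry. However, because of nonholonomic dynamics, there is no efficient trajectory planning solution capable of spatial-temporal joint optimization, particularly in the presence of unstructured environments and dynamic obstacles. To bridge this gap, we propose a versatile, real-time trajectory optimization method that can generate high-quality feasible trajectories using a full vehicle model under arbitrary constraints. By leveraging the differential flatness property of car-like robots, we use flat outputs to formulate all feasibility constraints analytically, which simplifies the trajectory planning problem. Moreover, obstacle avoidance is achieved with full-dimensional polygons to generate less conservative trajectories with safety guarantees, especially in tightly constrained spaces. We present comprehensive benchmarks against state-of-the-art methods, demonstrating the significance of the proposed method in terms of efficiency and trajectory quality. Real-world experiments verify the practicality of our algorithm. We will release our code as an open-source package for the reference of the research community.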
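Many sequential decision-making problems can be formulated as adaptive submodular maximization problems. However, most existing studies in this field focus on the pool-based setting, where one can pick items in any order. There have been few studies of the stream-based setting, where items arrive in an arbitrary order and one must decide immediately upon each item's arrival whether or not to select it. In this paper, we introduce a new class of utility functions, semi-policywise submodular functions. We develop a series of effective algorithms to maximize a semi-policywise submodular function under the stream-based setting.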
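To make the stream-based setting concrete, here is a minimal sketch of a single-threshold selection rule for a simple, non-adaptive coverage function. This is a well-known generic streaming heuristic swapped in for illustration. It is not the algorithm developed in the paper, and it does not model adaptivity or the utility class the paper introduces. The function name, the `budget` and `threshold` parameters, and the toy stream are all assumptions.

```python
def stream_select(stream, budget, threshold):
    """Select items from a stream with an irrevocable decision per arrival.

    Each item is a (name, elements) pair. The utility is the number of
    distinct elements covered, which is a monotone submodular function.
    """
    covered = set()
    chosen = []
    for name, elements in stream:
        if len(chosen) >= budget:
            break  # budget exhausted; later items cannot be selected
        gain = len(set(elements) - covered)  # marginal gain of this item
        if gain >= threshold:
            # Decide immediately on arrival; the choice is never revisited.
            chosen.append(name)
            covered |= set(elements)
    return chosen, covered


# Hypothetical stream of items, each covering a set of elements.
stream = [("a", {1, 2}), ("b", {2}), ("c", {3, 4, 5}), ("d", {1, 6, 7})]
chosen, covered = stream_select(stream, budget=2, threshold=2)
```

With this toy stream, `a` is accepted (gain 2), `b` is rejected (gain 0), and `c` is accepted (gain 3), after which the budget is used up and `d` is never considered. A pool-based method could instead inspect all four items before choosing, which is the difference between the two settings.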